1 Identifying likely duplicates by record linkage in a survey of prostitutes
Authors
Abstract
1.1 Concern about duplicates in an anonymous survey

The Los Angeles Women's Health Risk Study (LAWHRS) was a survey of female street prostitutes in Los Angeles County that aimed to provide insight into the evolution of the AIDS epidemic in the early 1990s (Kanouse et al. 1999). Goals of the study included estimating the size of the female street prostitute population in Los Angeles, determining seroprevalence of the HIV ...
Similar resources
Identifying Person Duplicates of Short Geographic Distance by Computer Matching
The Census Bureau conducted evaluations of person duplication in Census 2000. Duplicates separated by short geographic distances were identified by both clerical and computer matching. The evaluations showed that, for these short-distance duplicates, the computer matching algorithms were not able to find all of the duplicates identified by the clerks. However, the computer matching algorithms in the p...
Data Quality: Automated Edit/Imputation and Record Linkage
Statistical agencies collect data from surveys and create data warehouses by combining data from a variety of sources. To be suitable for analytic purposes, the files must be relatively free of error. Record linkage (Fellegi and Sunter, JASA 1969) is used for identifying duplicates within a file or across a set of files. Statistical data editing and imputation (Fellegi and Holt, JASA 1976) are ...
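The Fellegi and Sunter record linkage approach cited above scores a candidate record pair by summing per-field log likelihood ratios, then compares the total against an upper and a lower threshold. The sketch below is a minimal illustration of that scoring rule, not the method of any paper listed here: the field names, the m- and u-probabilities, and both thresholds are hypothetical values chosen only for demonstration.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative assumptions only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are a non-match)
M_U = {
    "first_name": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "zip_code": (0.85, 0.10),
}


def field_weight(field: str, agrees: bool) -> float:
    """Log2 likelihood-ratio weight for a single field comparison."""
    m, u = M_U[field]
    if agrees:
        return math.log2(m / u)              # agreement weight (positive here)
    return math.log2((1 - m) / (1 - u))      # disagreement weight (negative here)


def composite_weight(pattern: dict[str, bool]) -> float:
    """Sum the field weights over an agreement pattern {field: agrees?}."""
    return sum(field_weight(f, a) for f, a in pattern.items())


def classify(weight: float, upper: float = 6.0, lower: float = 0.0) -> str:
    """Three-way decision; the default thresholds are hypothetical."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link (clerical review)"


if __name__ == "__main__":
    all_agree = {"first_name": True, "birth_year": True, "zip_code": True}
    name_only = {"first_name": True, "birth_year": False, "zip_code": False}
    for pattern in (all_agree, name_only):
        w = composite_weight(pattern)
        print(f"{w:.2f}", classify(w))
```

With these assumed numbers, a pair that agrees on all three fields scores about 13.8 and is linked, while a pair that agrees only on first name scores about 0.7 and falls into the clerical-review band between the two thresholds.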
Unsupervised duplicate detection using sample non-duplicates
The problem of identifying objects in databases that refer to the same real-world entity is known, among other names, as duplicate detection or record linkage. Objects may be duplicates even though they are not identical, owing to errors and missing data. Traditional scenarios for duplicate detection are data warehouses, which are populated from several data sources. Duplicate detection here is part ...
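Because duplicate records need not be identical (typos, spelling variants), matchers usually compare fields approximately rather than exactly. The sketch below uses normalized Levenshtein edit distance as one common choice; the helper names and the 0.8 agreement cutoff are assumptions for illustration, and many systems use other string comparators such as Jaro-Winkler instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))           # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1]; 1.0 means identical after trimming and case-folding."""
    a, b = a.strip().casefold(), b.strip().casefold()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def approx_agree(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical rule: treat two values as agreeing if they are similar enough."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    print(similarity("Maria", "Mariah"), approx_agree("Maria", "Mariah"))
```

For example, "Maria" and "Mariah" differ by one insertion, giving a similarity of about 0.83, so they count as agreeing under the assumed cutoff; the resulting agree/disagree outcome can then feed a weight-based scoring rule.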
RLT-S: A Web System for Record Linkage
BACKGROUND: Record linkage integrates records across multiple related data sources, identifying duplicates and accounting for possible errors. Real-life applications require efficient algorithms to merge these voluminous data sources and find all records belonging to the same individual. Our recently devised, highly efficient record linkage algorithms provide best-known solutions to this challengi...
Probabilistic Linkage of Persian Record with Missing Data
Extended Abstract. When comprehensive information about a topic is scattered among two or more data sets, using only one of them loses the information available in the others. Hence, the scattered information must be integrated into a single comprehensive data set. On the other hand, we are sometimes interested in identifying duplicates within a single data set. The i...
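The last abstract above raises duplicates within one data set and missing data. One simple, widely used convention, shown here as an assumption rather than as that paper's method, is to let any field comparison involving a missing value contribute zero weight, so it counts neither for nor against a match. The sketch scores every record pair within a single file; the field names, weights, sample records, and cutoff are all hypothetical.

```python
from itertools import combinations

# Hypothetical (agreement, disagreement) weights per field; in practice these
# would be estimated from data. Values are illustrative only.
WEIGHTS = {
    "name": (6.5, -3.2),
    "birth_year": (4.2, -3.2),
    "zip_code": (3.1, -2.6),
}


def pair_weight(r1: dict, r2: dict) -> float:
    """Composite weight; a field missing (None) in either record adds 0."""
    total = 0.0
    for field, (w_agree, w_disagree) in WEIGHTS.items():
        v1, v2 = r1.get(field), r2.get(field)
        if v1 is None or v2 is None:
            continue  # missing value: no evidence either way
        total += w_agree if v1 == v2 else w_disagree
    return total


def likely_duplicates(records: list[dict], cutoff: float = 8.0) -> list[tuple[int, int, float]]:
    """Compare every pair within one file; return (i, j, weight) for pairs at or above cutoff."""
    hits = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        w = pair_weight(r1, r2)
        if w >= cutoff:
            hits.append((i, j, w))
    return hits


if __name__ == "__main__":
    sample = [
        {"name": "ana", "birth_year": 1965, "zip_code": "90028"},
        {"name": "ana", "birth_year": 1965, "zip_code": None},
        {"name": "lee", "birth_year": 1970, "zip_code": "90028"},
    ]
    print(likely_duplicates(sample))
```

In this toy file, records 0 and 1 agree on name and birth year and the missing ZIP code is ignored, so the pair scores about 10.7 and is flagged; the other two pairs score below zero. Comparing all pairs grows quadratically with file size, which is why large linkage jobs usually restrict comparisons to blocks of records that share some key.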